Introduction¶

This dataset was scraped from nextspaceflight.com and includes all space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!

Install Package with Country Codes¶

In [1]:
%pip install iso3166
Requirement already satisfied: iso3166 in c:\users\manda\anaconda3\lib\site-packages (2.1.1)
Note: you may need to restart the kernel to use updated packages.

Upgrade Plotly¶

Run the cell below if you are working with Google Colab.

In [2]:
%pip install --upgrade plotly
Requirement already satisfied: plotly in c:\users\manda\anaconda3\lib\site-packages (5.11.0)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\manda\anaconda3\lib\site-packages (from plotly) (8.0.1)
Note: you may need to restart the kernel to use updated packages.
In [3]:
%pip install country_converter
Requirement already satisfied: country_converter in c:\users\manda\anaconda3\lib\site-packages (0.8.0)
Requirement already satisfied: pandas>=1.0 in c:\users\manda\anaconda3\lib\site-packages (from country_converter) (1.4.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\manda\anaconda3\lib\site-packages (from pandas>=1.0->country_converter) (2021.3)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\manda\anaconda3\lib\site-packages (from pandas>=1.0->country_converter) (2.8.2)
Requirement already satisfied: numpy>=1.18.5 in c:\users\manda\anaconda3\lib\site-packages (from pandas>=1.0->country_converter) (1.21.5)
Requirement already satisfied: six>=1.5 in c:\users\manda\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas>=1.0->country_converter) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

Import Statements¶

In [4]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
import country_converter as coco

# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta

Notebook Presentation¶

In [5]:
pd.options.display.float_format = '{:,.2f}'.format

Load the Data¶

In [6]:
df_data = pd.read_csv('mission_launches.csv')

Preliminary Data Exploration¶

  • What is the shape of df_data?
  • How many rows and columns does it have?
  • What are the column names?
  • Are there any NaN values or duplicates?
In [7]:
df_data.shape
Out[7]:
(4324, 9)
In [8]:
df_data.columns
Out[8]:
Index(['Unnamed: 0.1', 'Unnamed: 0', 'Organisation', 'Location', 'Date',
       'Detail', 'Rocket_Status', 'Price', 'Mission_Status'],
      dtype='object')

Data Cleaning - Check for Missing Values and Duplicates¶

Consider removing columns containing junk data.
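
One way to act on this is sketched below. It uses a small illustrative frame (the rows are made up, not read from mission_launches.csv) that has the same two leftover `Unnamed` columns. It drops every column whose name starts with `Unnamed` and then counts missing values per column, which says more than a single True/False.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df_data; the values are made up for this sketch.
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "Organisation": ["SpaceX", "CASC", "SpaceX"],
    "Price": ["50.0", "29.75", np.nan],
})

# Drop the leftover index-like columns.
junk_columns = [col for col in sample.columns if col.startswith("Unnamed")]
tidy = sample.drop(columns=junk_columns)

# Count missing values per column instead of asking only whether any exist.
missing_per_column = tidy.isna().sum()
print(tidy.columns.tolist())
print(missing_per_column)
```

Applied to df_data, the per-column count would show which columns are responsible for the `True` returned by `isna().values.any()`.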

In [9]:
df_data.isna().values.any()
Out[9]:
True
In [10]:
df_data.duplicated().values.any()
Out[10]:
False
In [11]:
df_data.head()
Out[11]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA Fri Aug 07, 2020 05:12 UTC Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.0 Success
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... Thu Aug 06, 2020 04:01 UTC Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success
2 2 2 SpaceX Pad A, Boca Chica, Texas, USA Tue Aug 04, 2020 23:57 UTC Starship Prototype | 150 Meter Hop StatusActive NaN Success
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan Thu Jul 30, 2020 21:25 UTC Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.0 Success
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA Thu Jul 30, 2020 11:50 UTC Atlas V 541 | Perseverance StatusActive 145.0 Success
In [12]:
df_data.tail()
Out[12]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status
4319 4319 4319 US Navy LC-18A, Cape Canaveral AFS, Florida, USA Wed Feb 05, 1958 07:33 UTC Vanguard | Vanguard TV3BU StatusRetired NaN Failure
4320 4320 4320 AMBA LC-26A, Cape Canaveral AFS, Florida, USA Sat Feb 01, 1958 03:48 UTC Juno I | Explorer 1 StatusRetired NaN Success
4321 4321 4321 US Navy LC-18A, Cape Canaveral AFS, Florida, USA Fri Dec 06, 1957 16:44 UTC Vanguard | Vanguard TV3 StatusRetired NaN Failure
4322 4322 4322 RVSN USSR Site 1/5, Baikonur Cosmodrome, Kazakhstan Sun Nov 03, 1957 02:30 UTC Sputnik 8K71PS | Sputnik-2 StatusRetired NaN Success
4323 4323 4323 RVSN USSR Site 1/5, Baikonur Cosmodrome, Kazakhstan Fri Oct 04, 1957 19:28 UTC Sputnik 8K71PS | Sputnik-1 StatusRetired NaN Success
In [13]:
clean_df = df_data.dropna()
clean_df
Out[13]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA Fri Aug 07, 2020 05:12 UTC Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.0 Success
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... Thu Aug 06, 2020 04:01 UTC Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan Thu Jul 30, 2020 21:25 UTC Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.0 Success
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA Thu Jul 30, 2020 11:50 UTC Atlas V 541 | Perseverance StatusActive 145.0 Success
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China Sat Jul 25, 2020 03:13 UTC Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success
... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA Fri Jul 29, 1966 18:43 UTC Titan IIIB | KH-8 StatusRetired 59.0 Success
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu May 06, 1965 15:00 UTC Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Feb 11, 1965 15:19 UTC Titan IIIA | LES 1 StatusRetired 63.23 Success
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Dec 10, 1964 16:52 UTC Titan IIIA | Transtage 2 StatusRetired 63.23 Success
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Tue Sep 01, 1964 15:00 UTC Titan IIIA | Transtage 1 StatusRetired 63.23 Failure

964 rows × 9 columns

Descriptive Statistics¶

In [14]:
clean_df.describe()
Out[14]:
Unnamed: 0.1 Unnamed: 0
count 964.00 964.00
mean 858.49 858.49
std 784.21 784.21
min 0.00 0.00
25% 324.75 324.75
50% 660.50 660.50
75% 1,112.00 1,112.00
max 4,020.00 4,020.00

Number of Launches per Company¶

Create a chart that shows the number of space mission launches by organisation.

In [15]:
launches = clean_df.Organisation.value_counts()
launches
Out[15]:
CASC               158
NASA               149
SpaceX              99
ULA                 98
Arianespace         96
Northrop            83
ISRO                67
MHI                 37
VKS RF              33
US Air Force        26
Roscosmos           23
Kosmotras           22
ILS                 13
Eurockot            13
Rocket Lab          13
Martin Marietta      9
Lockheed             8
Boeing               7
JAXA                 3
RVSN USSR            2
Sandia               1
Virgin Orbit         1
ESA                  1
ExPace               1
EER                  1
Name: Organisation, dtype: int64
In [16]:
fig = px.pie(names=launches.index, values=launches.values, title="Number of Launches per Company")

fig.show()

Number of Active versus Retired Rockets¶

How many rockets are active compared to those that are decommissioned?

In [17]:
Active_Retired_Rockets = clean_df.Rocket_Status.value_counts()
Active_Retired_Rockets
Out[17]:
StatusActive     586
StatusRetired    378
Name: Rocket_Status, dtype: int64
In [18]:
fig = px.pie(names=Active_Retired_Rockets.index, values=Active_Retired_Rockets.values, title="Number of Active versus Retired Rockets")

fig.show()

Distribution of Mission Status¶

How many missions were successful? How many missions failed?

In [19]:
Missions = clean_df.Mission_Status.value_counts()
Missions
Out[19]:
Success              910
Failure               36
Partial Failure       17
Prelaunch Failure      1
Name: Mission_Status, dtype: int64
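
The raw counts are easier to compare as percentages. The sketch below rebuilds the counts from the output above as a Series and converts them to shares. On the real DataFrame, `clean_df.Mission_Status.value_counts(normalize=True)` should give the same fractions directly.

```python
import pandas as pd

# Counts copied from the Out[19] listing above.
missions = pd.Series(
    {"Success": 910, "Failure": 36, "Partial Failure": 17, "Prelaunch Failure": 1},
    name="Mission_Status",
)

# Convert counts to percentages of all launches in clean_df.
share = (missions / missions.sum() * 100).round(2)
print(share)
```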
In [20]:
fig = px.pie(names=Missions.index, values=Missions.values, title="Distribution of Mission Status")

fig.show()

How Expensive are the Launches?¶

Create a histogram and visualise the distribution. The price column is given in USD millions (watch out for missing values).
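
A minimal sketch of one way to prepare such a column, assuming prices are stored as text with commas as thousands separators. The frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up prices in text form (USD millions, commas as thousands separators).
raw = pd.DataFrame({
    "Organisation": ["A", "B", "C", "D"],
    "Price": ["50.0", "5,000.0", np.nan, "29.75"],
})

# Strip the commas, then convert; errors="coerce" turns anything unparseable into NaN.
price = pd.to_numeric(raw["Price"].str.replace(",", "", regex=False), errors="coerce")

# Remove only the rows that lack a price; gaps in other columns are left alone.
priced = raw.assign(Price=price).dropna(subset=["Price"])
print(priced)
```

Using `dropna(subset=["Price"])` rather than a bare `dropna()` is a design choice: it keeps rows that are missing values in columns unrelated to price.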

In [21]:
# Price is stored as text; cast to str and strip the thousands separators (commas) so it can be converted with pd.to_numeric
clean_df.Price = clean_df.Price.astype(str).str.replace(",", "")
C:\Users\manda\AppData\Local\Temp\ipykernel_1172\38078008.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
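
The warning above most likely appears because `clean_df` was produced by `df_data.dropna()`, which pandas flags as a frame derived from another one. A common way to avoid it, sketched here on a made-up frame, is to take an explicit copy before assigning to columns.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_data.
source = pd.DataFrame({"Price": ["1,160.0", np.nan, "50.0"]})

# .copy() makes the result an independent DataFrame, so later column
# assignments are unambiguous and should not trigger SettingWithCopyWarning.
cleaned = source.dropna().copy()
cleaned["Price"] = pd.to_numeric(cleaned["Price"].str.replace(",", "", regex=False))

print(cleaned)
print(source)  # the original frame is left unchanged
```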

In [22]:
# 10 most expensive launches
clean_df.Price = pd.to_numeric(clean_df.Price)
clean_df.sort_values("Price", ascending=False).head(10)
Out[22]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status
1916 1916 1916 RVSN USSR Site 110/37, Baikonur Cosmodrome, Kazakhstan Tue Nov 15, 1988 03:00 UTC Energiya/Buran | Buran StatusRetired 5,000.00 Success
2000 2000 2000 RVSN USSR Site 250, Baikonur Cosmodrome, Kazakhstan Fri May 15, 1987 17:30 UTC Energiya/Polyus | Polyus Space Station StatusRetired 5,000.00 Success
3545 3545 3545 NASA LC-39A, Kennedy Space Center, Florida, USA Wed Jul 16, 1969 13:32 UTC Saturn V | Apollo 11 StatusRetired 1,160.00 Success
3603 3603 3603 NASA LC-39A, Kennedy Space Center, Florida, USA Sat Dec 21, 1968 12:51 UTC Saturn V | Apollo 8 StatusRetired 1,160.00 Success
3475 3475 3475 NASA LC-39A, Kennedy Space Center, Florida, USA Sat Apr 11, 1970 19:13 UTC Saturn V | Apollo 13 StatusRetired 1,160.00 Success
3511 3511 3511 NASA LC-39A, Kennedy Space Center, Florida, USA Fri Nov 14, 1969 16:22 UTC Saturn V | Apollo 12 StatusRetired 1,160.00 Success
3243 3243 3243 NASA LC-39A, Kennedy Space Center, Florida, USA Sun Apr 16, 1972 17:54 UTC Saturn V | Apollo 16 StatusRetired 1,160.00 Success
3560 3560 3560 NASA LC-39B, Kennedy Space Center, Florida, USA Sun May 18, 1969 16:49 UTC Saturn V | Apollo 10 StatusRetired 1,160.00 Success
3180 3180 3180 NASA LC-39A, Kennedy Space Center, Florida, USA Tue Dec 19, 1972 19:24 UTC Saturn V | Apollo 17 StatusRetired 1,160.00 Success
3584 3584 3584 NASA LC-39A, Kennedy Space Center, Florida, USA Mon Mar 03, 1969 16:00 UTC Saturn V | Apollo 9 StatusRetired 1,160.00 Success
In [23]:
fig = px.histogram(clean_df, x="Price")
fig.show()

Use a Choropleth Map to Show the Number of Launches by Country¶

  • Create a choropleth map using the plotly documentation
  • Experiment with plotly's available colours. I quite like the sequential colour scale matter on this map.
  • You'll need to extract a country feature and update the country names that no longer exist.

Wrangle the Country Names

You'll need to use a three-letter country code for each country. You might have to change some country names.

  • Russia is the Russian Federation
  • New Mexico should be USA
  • Yellow Sea refers to China
  • Shahrud Missile Test Site should be Iran
  • Pacific Missile Range Facility should be USA
  • Barents Sea should be Russian Federation
  • Gran Canaria should be USA

You can use the iso3166 package to convert the country names to Alpha3 format.
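
The renaming rules listed above can also be sketched as a dictionary lookup with `Series.replace`. The location strings are copied from rows shown in this notebook. The variable names are illustrative.

```python
import pandas as pd

# Location strings in the "site, region, country" format used by the dataset.
locations = pd.Series([
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 133/3, Plesetsk Cosmodrome, Russia",
    "LP-41, Kauai, Pacific Missile Range Facility",
    "Site 95, Jiuquan Satellite Launch Center, China",
])

# The last comma-separated field is treated as the country.
country = locations.str.split(",").str[-1].str.strip()

# Renaming rules from the list above; values not listed are kept as they are.
renames = {
    "Russia": "Russian Federation",
    "New Mexico": "USA",
    "Yellow Sea": "China",
    "Shahrud Missile Test Site": "Iran",
    "Pacific Missile Range Facility": "USA",
    "Barents Sea": "Russian Federation",
    "Gran Canaria": "USA",
}
country = country.replace(renames)
print(country.tolist())
```

Keeping the rules in one dictionary keeps each old name next to its replacement, so a new rule is a one-line change.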

In [24]:
clean_df["Location"] = clean_df["Location"].astype(str)
clean_df["Country"]= clean_df['Location'].str.split(',').str[-1].str.strip()
clean_df
Out[24]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA Fri Aug 07, 2020 05:12 UTC Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.00 Success USA
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... Thu Aug 06, 2020 04:01 UTC Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan Thu Jul 30, 2020 21:25 UTC Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.00 Success Kazakhstan
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA Thu Jul 30, 2020 11:50 UTC Atlas V 541 | Perseverance StatusActive 145.00 Success USA
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China Sat Jul 25, 2020 03:13 UTC Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China
... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA Fri Jul 29, 1966 18:43 UTC Titan IIIB | KH-8 StatusRetired 59.00 Success USA
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu May 06, 1965 15:00 UTC Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Feb 11, 1965 15:19 UTC Titan IIIA | LES 1 StatusRetired 63.23 Success USA
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Dec 10, 1964 16:52 UTC Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Tue Sep 01, 1964 15:00 UTC Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA

964 rows × 10 columns

In [25]:
conditions = [
    clean_df['Country'].eq('Russia'),
    clean_df['Country'].eq('New Mexico'),
    clean_df['Country'].eq('Yellow Sea'),
    clean_df['Country'].eq('Shahrud Missile Test Site'),
    clean_df['Country'].eq('Pacific Missile Range Facility'),
    clean_df['Country'].eq('Barents Sea'),
    clean_df['Country'].eq('Gran Canaria'),
]
In [26]:
choices = ['Russian Federation', 'USA', 'China', 'Iran', 'USA', 'Russian Federation', 'USA']
In [27]:
clean_df["Country"] = np.select(conditions, choices, default=clean_df["Country"])
clean_df
Out[27]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA Fri Aug 07, 2020 05:12 UTC Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.00 Success USA
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... Thu Aug 06, 2020 04:01 UTC Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan Thu Jul 30, 2020 21:25 UTC Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.00 Success Kazakhstan
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA Thu Jul 30, 2020 11:50 UTC Atlas V 541 | Perseverance StatusActive 145.00 Success USA
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China Sat Jul 25, 2020 03:13 UTC Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China
... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA Fri Jul 29, 1966 18:43 UTC Titan IIIB | KH-8 StatusRetired 59.00 Success USA
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu May 06, 1965 15:00 UTC Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Feb 11, 1965 15:19 UTC Titan IIIA | LES 1 StatusRetired 63.23 Success USA
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Dec 10, 1964 16:52 UTC Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Tue Sep 01, 1964 15:00 UTC Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA

964 rows × 10 columns

In [28]:
cc = coco.CountryConverter()
In [29]:
clean_df["ISO"] = cc.pandas_convert(series=clean_df["Country"], to='ISO3')
clean_df
Out[29]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA Fri Aug 07, 2020 05:12 UTC Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.00 Success USA USA
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... Thu Aug 06, 2020 04:01 UTC Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China CHN
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan Thu Jul 30, 2020 21:25 UTC Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.00 Success Kazakhstan KAZ
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA Thu Jul 30, 2020 11:50 UTC Atlas V 541 | Perseverance StatusActive 145.00 Success USA USA
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China Sat Jul 25, 2020 03:13 UTC Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China CHN
... ... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA Fri Jul 29, 1966 18:43 UTC Titan IIIB | KH-8 StatusRetired 59.00 Success USA USA
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu May 06, 1965 15:00 UTC Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA USA
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Feb 11, 1965 15:19 UTC Titan IIIA | LES 1 StatusRetired 63.23 Success USA USA
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Thu Dec 10, 1964 16:52 UTC Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA USA
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Tue Sep 01, 1964 15:00 UTC Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA USA

964 rows × 11 columns

In [30]:
counted_countries = clean_df.groupby(["Country", "ISO"], as_index=False).agg({"Mission_Status": pd.Series.count})
counted_countries
Out[30]:
Country ISO Mission_Status
0 China CHN 159
1 France FRA 95
2 India IND 67
3 Japan JPN 40
4 Kazakhstan KAZ 46
5 New Zealand NZL 13
6 Russian Federation RUS 54
7 USA USA 490
In [31]:
mission_map = px.choropleth(counted_countries, locations="ISO", color="Mission_Status", hover_name="Country", color_continuous_scale=px.colors.sequential.matter)
mission_map.update_layout(coloraxis_showscale=True)
mission_map.show()
In [32]:
failure = clean_df[clean_df["Mission_Status"] == "Failure"]
failure
Out[32]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO
11 11 11 ExPace Site 95, Jiuquan Satellite Launch Center, China Fri Jul 10, 2020 04:17 UTC Kuaizhou 11 | Jilin-1 02E, CentiSpace-1 S2 StatusActive 28.30 Failure China CHN
15 15 15 Rocket Lab Rocket Lab LC-1A, Māhia Peninsula, New Zealand Sat Jul 04, 2020 21:19 UTC Electron/Curie | Pics Or It Didn't Happen StatusActive 7.50 Failure New Zealand NZL
27 27 27 Virgin Orbit Cosmic Girl, Mojave Air and Space Port, Califo... Mon May 25, 2020 19:50 UTC LauncherOne | Demo Flight StatusActive 12.00 Failure USA USA
36 36 36 CASC LC-2, Xichang Satellite Launch Center, China Thu Apr 09, 2020 11:46 UTC Long March 3B/E | Nusantara Dua StatusActive 29.15 Failure China CHN
124 124 124 Arianespace ELV-1 (SLV), Guiana Space Centre, French Guian... Thu Jul 11, 2019 01:53 UTC Vega | Falcon Eye 1 StatusActive 37.00 Failure France FRA
137 137 137 CASC LC-9, Taiyuan Satellite Launch Center, China Wed May 22, 2019 22:55 UTC Long March 4C | Yaogan Weixing-33 StatusActive 64.68 Failure China CHN
324 324 324 ISRO First Launch Pad, Satish Dhawan Space Centre, ... Thu Aug 31, 2017 13:30 UTC PSLV-XL | IRNSS-1H StatusActive 31.00 Failure India IND
353 353 353 Rocket Lab Rocket Lab LC-1A, Māhia Peninsula, New Zealand Thu May 25, 2017 04:20 UTC Electron | It's a Test StatusActive 7.50 Failure New Zealand NZL
414 414 414 CASC LC-9, Taiyuan Satellite Launch Center, China Wed Aug 31, 2016 18:50 UTC Long March 4C | Gaofen-10 StatusActive 64.68 Failure China CHN
481 481 481 Sandia LP-41, Kauai, Pacific Missile Range Facility Wed Nov 04, 2015 Super Stripy | HiakaSat, STACEM & Others StatusActive 15.00 Failure USA USA
499 499 499 SpaceX SLC-40, Cape Canaveral AFS, Florida, USA Sun Jun 28, 2015 14:21 UTC Falcon 9 v1.1 | CRS-7 StatusRetired 56.50 Failure USA USA
534 534 534 Northrop LP-0A, Wallops Flight Facility, Virginia, USA Tue Oct 28, 2014 22:22 UTC Antares 130 | CRS Orb-3 StatusRetired 80.00 Failure USA USA
601 601 601 VKS RF Site 81/24, Baikonur Cosmodrome, Kazakhstan Tue Jul 02, 2013 02:38 UTC Proton-M/DM-3 | Cosmos 2488, 2489 & 2490 StatusActive 65.00 Failure Kazakhstan KAZ
694 694 694 Northrop SLC-576E, Vandenberg AFB, California, USA Fri Mar 04, 2011 10:09 UTC Minotaur C (Taurus) | Glory, KySat-1, Hermes, ... StatusActive 45.00 Failure USA USA
703 703 703 ISRO Second Launch Pad, Satish Dhawan Space Centre,... Sat Dec 25, 2010 10:34 UTC GSLV Mk I | GSAT-5P StatusRetired 47.00 Failure India IND
731 731 731 ISRO Second Launch Pad, Satish Dhawan Space Centre,... Thu Apr 15, 2010 10:57 UTC GSLV Mk II | GSAT-4 StatusActive 47.00 Failure India IND
782 782 782 Northrop SLC-576E, Vandenberg AFB, California, USA Tue Feb 24, 2009 09:55 UTC Minotaur C (Taurus) | Orbiting Carbon Observatory StatusActive 45.00 Failure USA USA
808 808 808 SpaceX Omelek Island, Ronald Reagan Ballistic Missile... Sun Aug 03, 2008 03:34 UTC Falcon 1 | Flight 3 StatusRetired 7.00 Failure USA USA
879 879 879 SpaceX Omelek Island, Ronald Reagan Ballistic Missile... Wed Mar 21, 2007 01:10 UTC Falcon 1 | DemoSat StatusRetired 7.00 Failure USA USA
910 910 910 Kosmotras Site 109/95, Baikonur Cosmodrome, Kazakhstan Wed Jul 26, 2006 19:43 UTC Dnepr | BelKa 1 & Others StatusRetired 29.00 Failure Kazakhstan KAZ
913 913 913 ISRO Second Launch Pad, Satish Dhawan Space Centre,... Mon Jul 10, 2006 12:08 UTC GSLV Mk I | INSAT-4C StatusRetired 47.00 Failure India IND
929 929 929 SpaceX Omelek Island, Ronald Reagan Ballistic Missile... Fri Mar 24, 2006 21:30 UTC Falcon 1 | FalconSat-2 StatusRetired 7.00 Failure USA USA
944 944 944 Eurockot Site 133/3, Plesetsk Cosmodrome, Russia Sat Oct 08, 2005 15:02 UTC Rokot/Briz KM | CryoSat-1 StatusRetired 41.80 Failure Russian Federation RUS
1062 1062 1062 NASA LC-39A, Kennedy Space Center, Florida, USA Thu Jan 16, 2003 15:39 UTC Space Shuttle Columbia | STS-107 StatusRetired 450.00 Failure USA USA
1070 1070 1070 Arianespace ELA-3, Guiana Space Centre, French Guiana, France Wed Dec 11, 2002 22:22 UTC Ariane 5 ECA | Hot Bird 7, Stentor, MFD-A, MFD-B StatusActive 200.00 Failure France FRA
1127 1127 1127 Northrop SLC-576E, Vandenberg AFB, California, USA Fri Sep 21, 2001 18:49 UTC Minotaur C (Taurus) | Orbview-4/QuikTOMS StatusActive 45.00 Failure USA USA
1418 1418 1418 Northrop Stargazer, Vandenberg AFB, California, USA Mon Nov 04, 1996 17:08 UTC Pegasus XL | HETE & SAC-B StatusActive 40.00 Failure USA USA
1483 1483 1483 EER LP-0A, Wallops Flight Facility, Virginia, USA Mon Oct 23, 1995 22:03 UTC Conestoga-1620 | METEOR StatusRetired 20.00 Failure USA USA
1504 1504 1504 Northrop Stargazer, Vandenberg AFB, California, USA Thu Jun 22, 1995 19:58 UTC Pegasus XL | STEP-3 StatusActive 40.00 Failure USA USA
1570 1570 1570 Northrop Stargazer, Vandenberg AFB, California, USA Mon Jun 27, 1994 21:15 UTC Pegasus XL | STEP-1 StatusActive 40.00 Failure USA USA
1607 1607 1607 Martin Marietta SLC-4W, Vandenberg AFB, California, USA Tue Oct 05, 1993 17:56 UTC Titan II(23)G | Landsat 6 StatusRetired 35.00 Failure USA USA
1609 1609 1609 ISRO First Launch Pad, Satish Dhawan Space Centre, ... Mon Sep 20, 1993 05:12 UTC PSLV-G | IRS-P1 StatusRetired 25.00 Failure India IND
1837 1837 1837 Martin Marietta SLC-40, Cape Canaveral AFS, Florida, USA Wed Mar 14, 1990 11:52 UTC Commercial Titan III | Intelsat 603 StatusRetired 136.60 Failure USA USA
2079 2079 2079 NASA LC-39A, Kennedy Space Center, Florida, USA Tue Jan 28, 1986 16:38 UTC Space Shuttle Challenger | STS-51-L StatusRetired 450.00 Failure USA USA
3779 3779 3779 US Air Force SLC-4W, Vandenberg AFB, California, USA Wed Apr 26, 1967 Titan IIIB | OPS 4243 StatusRetired 59.00 Failure USA USA
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA Tue Sep 01, 1964 15:00 UTC Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA USA
In [33]:
# Number of failed missions per country, most failures first
df_countries = failure.groupby(["Country", "ISO"], as_index=False).agg({"Mission_Status": pd.Series.count})
df_countries = df_countries.sort_values(by="Mission_Status", ascending=False)
df_countries
Out[33]:
Country ISO Mission_Status
6 USA USA 20
2 India IND 5
0 China CHN 4
1 France FRA 2
3 Kazakhstan KAZ 2
4 New Zealand NZL 2
5 Russian Federation RUS 1
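
Raw failure counts favour countries with few launches, so a failure rate can be a fairer comparison. The sketch below combines the launch counts from Out[30] with the failure counts from Out[33]. It covers only missions with status `Failure` (not partial failures) and only the launches kept in clean_df.

```python
import pandas as pd

# Launch counts from Out[30] and failure counts from Out[33].
launches_by_iso = pd.Series(
    {"CHN": 159, "FRA": 95, "IND": 67, "JPN": 40, "KAZ": 46, "NZL": 13, "RUS": 54, "USA": 490}
)
failures_by_iso = pd.Series(
    {"USA": 20, "IND": 5, "CHN": 4, "FRA": 2, "KAZ": 2, "NZL": 2, "RUS": 1}
)

# Countries without a recorded failure (here JPN) get 0 instead of NaN.
failures_by_iso = failures_by_iso.reindex(launches_by_iso.index, fill_value=0)

# Failures as a percentage of launches, highest rate first.
failure_rate = (failures_by_iso / launches_by_iso * 100).round(2).sort_values(ascending=False)
print(failure_rate)
```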

Use a Choropleth Map to Show the Number of Failures by Country¶

In [34]:
failure_map = px.choropleth(df_countries, locations="ISO", color="Mission_Status", hover_name="Country", color_continuous_scale=px.colors.sequential.matter)
failure_map.update_layout(coloraxis_showscale=True)
failure_map.show()

Create a Plotly Sunburst Chart of the countries, organisations, and mission status.¶

In [35]:
org = clean_df.groupby(by=["Country", "Organisation"], as_index=False).agg({"Mission_Status": pd.Series.count})
org.sort_values("Mission_Status", ascending=False)
Out[35]:
Country Organisation Mission_Status
0 China CASC 158
23 USA NASA 149
26 USA SpaceX 99
27 USA ULA 98
2 France Arianespace 94
24 USA Northrop 83
4 India ISRO 67
6 Japan MHI 37
17 Russian Federation VKS RF 28
28 USA US Air Force 26
11 Kazakhstan Roscosmos 20
14 Russian Federation Eurockot 13
13 New Zealand Rocket Lab 13
9 Kazakhstan Kosmotras 12
15 Russian Federation Kosmotras 10
22 USA Martin Marietta 9
20 USA ILS 8
21 USA Lockheed 8
18 USA Boeing 7
12 Kazakhstan VKS RF 5
8 Kazakhstan ILS 5
16 Russian Federation Roscosmos 3
5 Japan JAXA 3
10 Kazakhstan RVSN USSR 2
7 Kazakhstan Arianespace 2
1 China ExPace 1
19 USA EER 1
25 USA Sandia 1
3 France ESA 1
29 USA Virgin Orbit 1
In [36]:
burst = px.sunburst(org, path=["Country", "Organisation"], values="Mission_Status", title="Where do launches take place?")
burst.update_layout(coloraxis_showscale=False)
burst.show()
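
The heading also mentions mission status, which the chart above does not include. A minimal sketch of the aggregation for a three-level chart follows, using a few made-up rows. The column name `Launches` is an illustrative choice.

```python
import pandas as pd

# A few made-up rows with the three columns the heading mentions.
rows = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "China", "China"],
    "Organisation": ["SpaceX", "SpaceX", "NASA", "CASC", "CASC"],
    "Mission_Status": ["Success", "Failure", "Success", "Success", "Success"],
})

# One row per (country, organisation, status) combination with its launch count.
levels = ["Country", "Organisation", "Mission_Status"]
status_counts = rows.groupby(levels, as_index=False).size()
status_counts = status_counts.rename(columns={"size": "Launches"})
print(status_counts)
```

A frame shaped like this could then be passed to `px.sunburst` with `path=levels` and `values="Launches"`.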

Analyse the Total Amount of Money Spent by Organisation on Space Missions¶

In [37]:
total_amount = clean_df.groupby("Organisation", as_index=False).agg({"Price": pd.Series.sum})
total_amount.sort_values("Price", ascending=False)
Out[37]:
Organisation Price
14 NASA 76,280.00
0 Arianespace 16,345.00
21 ULA 14,798.00
16 RVSN USSR 10,000.00
2 CASC 6,340.26
20 SpaceX 5,444.00
15 Northrop 3,930.00
12 MHI 3,532.50
8 ISRO 2,177.00
22 US Air Force 1,550.92
23 VKS RF 1,548.90
7 ILS 1,320.00
1 Boeing 1,241.00
18 Roscosmos 1,187.50
13 Martin Marietta 721.40
10 Kosmotras 638.00
5 Eurockot 543.40
11 Lockheed 280.00
9 JAXA 168.00
17 Rocket Lab 97.50
4 ESA 37.00
6 ExPace 28.30
3 EER 20.00
19 Sandia 15.00
24 Virgin Orbit 12.00

Analyse the Amount of Money Spent by Organisation per Launch¶

In [38]:
money_spent_per_launch = clean_df.groupby(by=["Organisation","Detail"], as_index=False).agg({"Price": pd.Series.sum})
money_spent_per_launch
Out[38]:
Organisation Detail Price
0 Arianespace Ariane 5 ECA | ABS-2, Athena-Fidus 200.00
1 Arianespace Ariane 5 ECA | Alphasat I-XL, INSAT-3D 200.00
2 Arianespace Ariane 5 ECA | Amazonas 2 & COMSATBw-1 200.00
3 Arianespace Ariane 5 ECA | Amazonas-3, Azerspace-1 (Africa... 200.00
4 Arianespace Ariane 5 ECA | Arabsat 6B, GSAT-15 200.00
... ... ... ...
957 VKS RF Soyuz 2.1b/Fregat | Cosmos 2544 48.50
958 VKS RF Soyuz 2.1b/Fregat | GLONASS-M No.50S 48.50
959 VKS RF Soyuz 2.1b/Fregat | GLONASS-M No.51S 48.50
960 VKS RF Soyuz 2.1b/Fregat | GLONASS-M No.54S 48.50
961 Virgin Orbit LauncherOne | Demo Flight 12.00

962 rows × 3 columns
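
Grouping by both Organisation and Detail yields almost one group per mission (962 groups for 964 rows), so the result is close to the original price list. If the goal is the average spend per launch for each organisation, one possible reading is the mean price per organisation. The sketch below uses illustrative prices in USD millions.

```python
import pandas as pd

# Illustrative prices in USD millions.
launch_prices = pd.DataFrame({
    "Organisation": ["NASA", "NASA", "SpaceX", "SpaceX", "SpaceX"],
    "Price": [450.0, 1160.0, 50.0, 50.0, 62.0],
})

# Total spend, number of launches, and mean price per launch for each organisation.
per_launch = (
    launch_prices.groupby("Organisation")["Price"]
    .agg(total="sum", launches="count", mean_per_launch="mean")
    .sort_values("mean_per_launch", ascending=False)
)
print(per_launch)
```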


Chart the Number of Launches per Year¶

In [39]:
clean_df['Date'] = pd.to_datetime(clean_df["Date"], utc=True)
clean_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 964 entries, 0 to 4020
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype              
---  ------          --------------  -----              
 0   Unnamed: 0.1    964 non-null    int64              
 1   Unnamed: 0      964 non-null    int64              
 2   Organisation    964 non-null    object             
 3   Location        964 non-null    object             
 4   Date            964 non-null    datetime64[ns, UTC]
 5   Detail          964 non-null    object             
 6   Rocket_Status   964 non-null    object             
 7   Price           964 non-null    float64            
 8   Mission_Status  964 non-null    object             
 9   Country         964 non-null    object             
 10  ISO             964 non-null    object             
dtypes: datetime64[ns, UTC](1), float64(1), int64(2), object(7)
memory usage: 90.4+ KB
In [40]:
clean_df['Year'] = clean_df["Date"].dt.year
clean_df
Out[40]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO Year
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA 2020-08-07 05:12:00+00:00 Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.00 Success USA USA 2020
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... 2020-08-06 04:01:00+00:00 Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China CHN 2020
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan 2020-07-30 21:25:00+00:00 Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.00 Success Kazakhstan KAZ 2020
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA 2020-07-30 11:50:00+00:00 Atlas V 541 | Perseverance StatusActive 145.00 Success USA USA 2020
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China 2020-07-25 03:13:00+00:00 Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China CHN 2020
... ... ... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA 1966-07-29 18:43:00+00:00 Titan IIIB | KH-8 StatusRetired 59.00 Success USA USA 1966
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-05-06 15:00:00+00:00 Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA USA 1965
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-02-11 15:19:00+00:00 Titan IIIA | LES 1 StatusRetired 63.23 Success USA USA 1965
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-12-10 16:52:00+00:00 Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA USA 1964
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-09-01 15:00:00+00:00 Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA USA 1964

964 rows × 12 columns

In [41]:
per_year = clean_df["Year"].value_counts()
per_year
Out[41]:
2018    88
2019    73
2017    66
2016    64
2020    52
2014    41
2015    39
2013    33
2009    31
2011    29
2010    29
2007    29
2008    28
2006    27
2012    25
2003    20
1998    18
1997    18
2004    17
2002    17
1994    16
2000    16
2005    15
1999    14
1996    14
1993    12
1992    12
1990    11
1968    10
1985    10
1995    10
2001    10
1969     8
1967     8
1991     7
1989     6
1984     6
1983     5
1988     5
1982     4
1987     3
1986     3
1966     3
1965     2
1964     2
1971     2
1972     2
1981     2
1970     1
1973     1
Name: Year, dtype: int64
In [42]:
bar = px.bar(x = per_year.index, y = per_year, title="Number of Launches per Year", hover_name=per_year.index, color=per_year.values, color_continuous_scale='Agsunset')
bar.update_layout(xaxis_title='Year', yaxis_title='Number of Launches', coloraxis_showscale=False)
bar.show()

Chart the Number of Launches Month-on-Month until the Present¶

Which month has seen the highest number of launches of all time? Superimpose a rolling average on the month-on-month time series chart.

In [43]:
clean_df['Month'] = clean_df["Date"].dt.month
clean_df
Out[43]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO Year Month
0 0 0 SpaceX LC-39A, Kennedy Space Center, Florida, USA 2020-08-07 05:12:00+00:00 Falcon 9 Block 5 | Starlink V1 L9 & BlackSky StatusActive 50.00 Success USA USA 2020 8
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... 2020-08-06 04:01:00+00:00 Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China CHN 2020 8
3 3 3 Roscosmos Site 200/39, Baikonur Cosmodrome, Kazakhstan 2020-07-30 21:25:00+00:00 Proton-M/Briz-M | Ekspress-80 & Ekspress-103 StatusActive 65.00 Success Kazakhstan KAZ 2020 7
4 4 4 ULA SLC-41, Cape Canaveral AFS, Florida, USA 2020-07-30 11:50:00+00:00 Atlas V 541 | Perseverance StatusActive 145.00 Success USA USA 2020 7
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China 2020-07-25 03:13:00+00:00 Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China CHN 2020 7
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA 1966-07-29 18:43:00+00:00 Titan IIIB | KH-8 StatusRetired 59.00 Success USA USA 1966 7
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-05-06 15:00:00+00:00 Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA USA 1965 5
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-02-11 15:19:00+00:00 Titan IIIA | LES 1 StatusRetired 63.23 Success USA USA 1965 2
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-12-10 16:52:00+00:00 Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA USA 1964 12
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-09-01 15:00:00+00:00 Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA USA 1964 9

964 rows × 13 columns

In [44]:
month_to_month = clean_df["Month"].value_counts()
month_to_month
Out[44]:
12    92
4     91
9     88
5     86
6     85
10    83
8     82
7     77
3     77
11    77
1     66
2     60
Name: Month, dtype: int64
In [45]:
bar = px.bar(x = month_to_month.index, y = month_to_month, title="Number of Launches by Calendar Month", hover_name=month_to_month.index, color=month_to_month.values, color_continuous_scale='Agsunset')
bar.update_layout(xaxis_title='Month', yaxis_title='Number of Launches', coloraxis_showscale=False)
bar.show()
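
The bar chart above pools launches by calendar month across all years, so it does not yet give a month-on-month time series or a rolling average. The following is a minimal sketch of one way to build both with pandas. The `launches` frame is a hypothetical stand-in for `clean_df`, and the window of 2 is chosen only because the sample is tiny.

```python
import pandas as pd

# Hypothetical sample standing in for clean_df["Date"] (timezone-aware launch dates).
dates = pd.to_datetime(
    ["2020-01-05", "2020-01-20", "2020-02-11", "2020-04-02", "2020-04-15", "2020-04-30"],
    utc=True,
)
launches = pd.DataFrame({"Date": dates})

# Count launches per month; resampling keeps months without launches as zero counts.
monthly = (
    launches.set_index("Date")
    .resample("MS")
    .size()
    .rename("Launches")
    .to_frame()
)

# Rolling mean over consecutive months (window=2 only because the sample is small).
monthly["Rolling_Avg"] = monthly["Launches"].rolling(window=2).mean()

# The single month with the most launches in the whole series.
busiest = monthly["Launches"].idxmax()
print(monthly)
print("Busiest month:", busiest.strftime("%Y-%m"))
```

On the full dataset a longer window, for example 12 months, would probably smooth the series better. The result could then be plotted with `px.line(monthly.reset_index(), x="Date", y=["Launches", "Rolling_Avg"])`.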

Launches per Month: Which months are most popular and least popular for launches?¶

Some months have better weather than others. Which time of year seems to be best for space missions?

In [46]:
max_month = month_to_month.idxmax()
max_month
Out[46]:
12
In [47]:
min_month = month_to_month.idxmin()
min_month
Out[47]:
2

Most popular month for rocket launches: December

Least popular month for rocket launches: February
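
The month names above were read off the numeric results by hand. As a small sketch, the same lookup can be done in code with the standard library's `calendar` module. The counts below are copied from the `value_counts()` output earlier in this section, and the names come back in English under the default locale.

```python
import calendar
import pandas as pd

# Launch counts per calendar month, copied from the value_counts() output above.
month_to_month = pd.Series(
    {12: 92, 4: 91, 9: 88, 5: 86, 6: 85, 10: 83, 8: 82, 7: 77, 3: 77, 11: 77, 1: 66, 2: 60},
    name="Month",
)

# idxmax/idxmin return the month number; calendar.month_name maps it to a name.
most_popular = calendar.month_name[int(month_to_month.idxmax())]
least_popular = calendar.month_name[int(month_to_month.idxmin())]
print(most_popular, least_popular)
```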

How has the Launch Price varied Over Time?¶

Create a line chart that shows the average price of rocket launches over time.

In [48]:
Price_Over_Time = clean_df.groupby("Year", as_index=False).agg({"Price": pd.Series.mean})
Price_Over_Time.sort_values("Price", ascending=False)
Out[48]:
Year Price
16 1987 1,687.20
17 1988 1,193.16
9 1973 1,160.00
6 1970 1,160.00
7 1971 1,160.00
8 1972 1,160.00
5 1969 609.50
10 1981 450.00
14 1985 408.08
20 1991 391.43
18 1989 380.83
13 1984 380.13
12 1983 366.16
11 1982 345.20
24 1995 325.00
21 1992 319.35
15 1986 310.27
30 2001 290.70
19 1990 289.15
4 1968 279.20
22 1993 276.73
25 1996 243.20
26 1997 221.74
23 1994 221.51
3 1967 196.62
31 2002 185.59
38 2009 180.29
29 2000 173.12
27 1998 152.12
39 2010 148.40
40 2011 146.60
35 2006 138.61
37 2008 129.75
28 1999 128.51
36 2007 125.70
41 2012 122.29
43 2014 102.55
34 2005 95.79
33 2004 92.74
44 2015 91.55
42 2013 90.55
32 2003 80.51
45 2016 79.46
46 2017 69.49
47 2018 64.75
0 1964 63.23
1 1965 63.23
48 2019 59.61
2 1966 59.00
49 2020 56.65
In [49]:
l_chart = px.line(Price_Over_Time, x="Year", y="Price")

l_chart.update_layout(title="Average Price of Rocket Launches Over Time", xaxis_title="Year", yaxis_title="Average Price")
l_chart.show()

Chart the Number of Launches over Time by the Top 10 Organisations.¶

How has the dominance of launches changed over time between the different players?

In [50]:
top_ten = clean_df.groupby(by=["Organisation", "Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
top_ten = top_ten.sort_values(by="Mission_Status", ascending=False)[:52]
top_ten["Organisation"].nunique()
Out[50]:
10
In [51]:
top_ten
Out[51]:
Organisation Year Mission_Status
48 CASC 2018 27
229 SpaceX 2018 21
49 CASC 2019 21
228 SpaceX 2017 18
50 CASC 2020 16
46 CASC 2016 15
230 SpaceX 2019 13
231 SpaceX 2020 13
239 ULA 2014 13
47 CASC 2017 12
240 ULA 2015 10
241 ULA 2016 10
12 Arianespace 2015 9
152 NASA 1985 9
227 SpaceX 2016 9
238 ULA 2013 9
184 Northrop 1998 8
215 Roscosmos 2019 8
158 NASA 1992 8
163 NASA 1997 8
13 Arianespace 2016 8
237 ULA 2012 8
250 US Air Force 1968 8
43 CASC 2007 8
40 CASC 2004 8
236 ULA 2011 8
14 Arianespace 2017 8
94 ISRO 2018 7
9 Arianespace 2012 7
159 NASA 1993 7
160 NASA 1994 7
161 NASA 1995 7
234 ULA 2009 7
235 ULA 2010 7
92 ISRO 2016 7
15 Arianespace 2018 7
249 US Air Force 1967 7
162 NASA 1996 7
226 SpaceX 2015 7
11 Arianespace 2014 6
6 Arianespace 2009 6
7 Arianespace 2010 6
167 NASA 2001 6
242 ULA 2017 6
156 NASA 1990 6
243 ULA 2018 6
157 NASA 1991 6
44 CASC 2008 6
16 Arianespace 2019 6
225 SpaceX 2014 6
174 NASA 2009 6
210 Rocket Lab 2019 6
In [52]:
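# Sort by year first: px.line connects points in row order, and top_ten is sorted by launch count.
topten_chart = px.line(top_ten.sort_values("Year"), x="Year", y="Mission_Status", color="Organisation", hover_name="Organisation")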

topten_chart.update_layout(xaxis_title="Year", yaxis_title="Number of Launches")
topten_chart.show()
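
The slice `[:52]` above was tuned by hand until ten organisations appeared, so it keeps only the busiest organisation-year pairs and drops the quieter years of those same organisations. A possible alternative, sketched here on a hypothetical miniature of `clean_df`, ranks organisations by total launches first and then keeps every year for the top N.

```python
import pandas as pd

# Hypothetical miniature of clean_df with only the columns used here.
df = pd.DataFrame({
    "Organisation": ["CASC", "CASC", "CASC", "SpaceX", "SpaceX", "ULA"],
    "Year": [2018, 2018, 2019, 2018, 2019, 2019],
})

N = 2  # would be 10 on the full dataset

# Rank organisations by their total number of launches across all years.
top_orgs = df["Organisation"].value_counts().nlargest(N).index

# Keep every year for those organisations, then count launches per (organisation, year).
launches_by_org = (
    df[df["Organisation"].isin(top_orgs)]
    .groupby(["Organisation", "Year"], as_index=False)
    .size()
    .rename(columns={"size": "Launches"})
    .sort_values(["Organisation", "Year"])
)
print(launches_by_org)
```

Because the rows end up sorted by organisation and year, passing `launches_by_org` to `px.line` with `color="Organisation"` should draw each line in chronological order.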

Cold War Space Race: USA vs USSR¶

The Cold War lasted from the start of the dataset until 1991.

In [53]:
cold_war_year = clean_df[clean_df["Year"] < 1992]
cold_war_year["Country"]
Out[53]:
1724    USA
1732    USA
1741    USA
1743    USA
1750    USA
       ... 
3855    USA
3971    USA
3993    USA
4000    USA
4020    USA
Name: Country, Length: 101, dtype: object
In [54]:
options = ['KAZ', 'USA']
In [55]:
cold_war = cold_war_year[cold_war_year['ISO'].isin(options)]
cold_war_year.groupby(by=["ISO"], as_index=False).count()
Out[55]:
ISO Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country Year Month
0 CHN 9 9 9 9 9 9 9 9 9 9 9 9
1 KAZ 2 2 2 2 2 2 2 2 2 2 2 2
2 USA 90 90 90 90 90 90 90 90 90 90 90 90
In [56]:
cold_war
Out[56]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO Year Month
1724 1724 1724 NASA LC-39A, Kennedy Space Center, Florida, USA 1991-11-24 23:44:00+00:00 Space Shuttle Atlantis | STS-44 StatusRetired 450.00 Success USA USA 1991 11
1732 1732 1732 NASA LC-39A, Kennedy Space Center, Florida, USA 1991-09-12 23:11:00+00:00 Space Shuttle Discovery | STS-48 StatusRetired 450.00 Success USA USA 1991 9
1741 1741 1741 NASA LC-39A, Kennedy Space Center, Florida, USA 1991-08-02 15:02:00+00:00 Space Shuttle Atlantis | STS-43 StatusRetired 450.00 Success USA USA 1991 8
1743 1743 1743 Northrop NB-52B Carrier, Edwards AFB, California, USA 1991-07-21 17:33:00+00:00 Pegasus/HAPS | 7 Microsats StatusRetired 40.00 Partial Failure USA USA 1991 7
1750 1750 1750 NASA LC-39B, Kennedy Space Center, Florida, USA 1991-06-05 13:24:00+00:00 Space Shuttle Columbia | STS-40 StatusRetired 450.00 Success USA USA 1991 6
... ... ... ... ... ... ... ... ... ... ... ... ... ...
3855 3855 3855 US Air Force SLC-4W, Vandenberg AFB, California, USA 1966-07-29 18:43:00+00:00 Titan IIIB | KH-8 StatusRetired 59.00 Success USA USA 1966 7
3971 3971 3971 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-05-06 15:00:00+00:00 Titan IIIA | LES 2 & LCS 1 StatusRetired 63.23 Success USA USA 1965 5
3993 3993 3993 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1965-02-11 15:19:00+00:00 Titan IIIA | LES 1 StatusRetired 63.23 Success USA USA 1965 2
4000 4000 4000 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-12-10 16:52:00+00:00 Titan IIIA | Transtage 2 StatusRetired 63.23 Success USA USA 1964 12
4020 4020 4020 US Air Force SLC-20, Cape Canaveral AFS, Florida, USA 1964-09-01 15:00:00+00:00 Titan IIIA | Transtage 1 StatusRetired 63.23 Failure USA USA 1964 9

92 rows × 13 columns

In [57]:
cold_war_group = cold_war["ISO"].value_counts()
cold_war_group
Out[57]:
USA    90
KAZ     2
Name: ISO, dtype: int64

Create a Plotly Pie Chart comparing the total number of launches of the USSR and the USA¶

Hint: Remember to include former Soviet Republics like Kazakhstan when analysing the total number of launches.

In [58]:
# names sets the slice labels; the labels argument of px.pie is for renaming columns, so it is dropped.
cold_war_chart = px.pie(names=cold_war_group.index, values=cold_war_group.values, title="Total Number of Launches of The USSR and The USA")

cold_war_chart.show()

Create a Chart that Shows the Total Number of Launches Year-On-Year by the Two Superpowers¶

In [59]:
superpower = ['CASC', 'NASA']
In [60]:
superpower_df = clean_df[clean_df['Organisation'].isin(superpower)]
superpower_df.head()
Out[60]:
Unnamed: 0.1 Unnamed: 0 Organisation Location Date Detail Rocket_Status Price Mission_Status Country ISO Year Month
1 1 1 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... 2020-08-06 04:01:00+00:00 Long March 2D | Gaofen-9 04 & Q-SAT StatusActive 29.75 Success China CHN 2020 8
5 5 5 CASC LC-9, Taiyuan Satellite Launch Center, China 2020-07-25 03:13:00+00:00 Long March 4B | Ziyuan-3 03, Apocalypse-10 & N... StatusActive 64.68 Success China CHN 2020 7
12 12 12 CASC LC-3, Xichang Satellite Launch Center, China 2020-07-09 12:11:00+00:00 Long March 3B/E | Apstar-6D StatusActive 29.15 Success China CHN 2020 7
14 14 14 CASC Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... 2020-07-04 23:44:00+00:00 Long March 2D | Shiyan-6 02 StatusActive 29.75 Success China CHN 2020 7
16 16 16 CASC LC-9, Taiyuan Satellite Launch Center, China 2020-07-03 03:10:00+00:00 Long March 4B | Gaofen Duomo & BY-02 StatusActive 64.68 Success China CHN 2020 7
In [61]:
spw = superpower_df.groupby(by=["Organisation", "Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
spw = spw.sort_values(by="Mission_Status", ascending=False)
spw
Out[61]:
Organisation Year Mission_Status
26 CASC 2018 27
27 CASC 2019 21
28 CASC 2020 16
24 CASC 2016 15
25 CASC 2017 12
... ... ... ...
6 CASC 1988 1
4 CASC 1986 1
3 CASC 1985 1
2 CASC 1984 1
32 NASA 1970 1

65 rows × 3 columns

In [62]:
# Put time on the x-axis and sort by year so each line runs chronologically.
spw_chart = px.line(spw.sort_values("Year"), x="Year", y="Mission_Status", color="Organisation", hover_name="Organisation")

spw_chart.update_layout(xaxis_title="Year", yaxis_title="Number of Launches")
spw_chart.show()
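
The cells above compare CASC and NASA, which are organisations from China and the USA rather than the two Cold War superpowers. A rough sketch of a USA versus USSR comparison follows. It assumes that launches from sites now in Russia or Kazakhstan count as Soviet launches, and it uses a hypothetical sample in place of the rows before 1992.

```python
import pandas as pd

# Hypothetical sample of Cold War era rows (Year < 1992) with ISO country codes.
cold = pd.DataFrame({
    "ISO": ["USA", "USA", "KAZ", "RUS", "USA", "KAZ", "CHN"],
    "Year": [1965, 1965, 1965, 1966, 1966, 1966, 1966],
})

# Assumption: sites in present-day Russia and Kazakhstan are attributed to the USSR.
bloc_by_iso = {"USA": "USA", "RUS": "USSR", "KAZ": "USSR"}

# Countries outside the mapping (here CHN) become NaN and are dropped.
cold = cold.assign(Bloc=cold["ISO"].map(bloc_by_iso)).dropna(subset=["Bloc"])

# Launches per year for each superpower, sorted so line charts run chronologically.
yearly = (
    cold.groupby(["Year", "Bloc"], as_index=False)
    .size()
    .rename(columns={"size": "Launches"})
    .sort_values(["Bloc", "Year"])
)
print(yearly)
```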
In [63]:
failure = clean_df[clean_df["Mission_Status"] == "Failure"]
success = clean_df[clean_df["Mission_Status"] == "Success"]

Chart the Total Number of Mission Failures Year on Year.¶

In [64]:
failure_year = failure.groupby(by=["Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
failure_year
Out[64]:
Year Mission_Status
0 1964 1
1 1967 1
2 1986 1
3 1990 1
4 1993 2
5 1994 1
6 1995 2
7 1996 1
8 2001 1
9 2002 1
10 2003 1
11 2005 1
12 2006 3
13 2007 1
14 2008 1
15 2009 1
16 2010 2
17 2011 1
18 2013 1
19 2014 1
20 2015 2
21 2016 1
22 2017 2
23 2019 2
24 2020 4
In [65]:
fail_year = px.line(failure_year, x="Year", y="Mission_Status")

fail_year.update_layout(xaxis_title="Year", yaxis_title="Total Number of Mission Failures Year on Year")
fail_year.show()

Chart the Percentage of Failures over Time¶

Did failures go up or down over time? Did the countries get better at minimising risk and improving their chances of success over time?

In [66]:
overtime_fail = failure.groupby(["Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
overtime_fail
Out[66]:
Year Mission_Status
0 1964 1
1 1967 1
2 1986 1
3 1990 1
4 1993 2
5 1994 1
6 1995 2
7 1996 1
8 2001 1
9 2002 1
10 2003 1
11 2005 1
12 2006 3
13 2007 1
14 2008 1
15 2009 1
16 2010 2
17 2011 1
18 2013 1
19 2014 1
20 2015 2
21 2016 1
22 2017 2
23 2019 2
24 2020 4
In [67]:
# names sets the slice labels; the labels argument of px.pie is for renaming columns, so it is dropped.
fail_chart = px.pie(names=overtime_fail.Year, values=overtime_fail.Mission_Status, title="Share of All Failures by Year")

fail_chart.update_traces(textposition="outside", textinfo="percent+label")

fail_chart.show()
In [68]:
clean_df.Mission_Status.value_counts()
Out[68]:
Success              910
Failure               36
Partial Failure       17
Prelaunch Failure      1
Name: Mission_Status, dtype: int64
In [69]:
overtime_success = success.groupby(["Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
overtime_success
Out[69]:
Year Mission_Status
0 1964 1
1 1965 2
2 1966 3
3 1967 6
4 1968 9
5 1969 8
6 1970 1
7 1971 2
8 1972 2
9 1973 1
10 1981 2
11 1982 4
12 1983 5
13 1984 6
14 1985 10
15 1986 2
16 1987 3
17 1988 5
18 1989 6
19 1990 10
20 1991 6
21 1992 12
22 1993 10
23 1994 13
24 1995 8
25 1996 13
26 1997 17
27 1998 18
28 1999 14
29 2000 16
30 2001 8
31 2002 16
32 2003 19
33 2004 16
34 2005 14
35 2006 24
36 2007 26
37 2008 27
38 2009 30
39 2010 27
40 2011 27
41 2012 24
42 2013 31
43 2014 40
44 2015 37
45 2016 61
46 2017 62
47 2018 87
48 2019 71
49 2020 48
In [70]:
# names sets the slice labels; the labels argument of px.pie is for renaming columns, so it is dropped.
success_chart = px.pie(names=overtime_success.Year, values=overtime_success.Mission_Status, title="Share of All Successes by Year")

success_chart.update_traces(textposition="outside", textinfo="percent+label")

success_chart.show()
In [71]:
plt.figure(figsize=(8, 4), dpi=200)
plt.title('Comparing Success and Failure per Year', fontsize=18)
ax1 = plt.gca()
ax2 = ax1.twinx()

ax1.grid(color='grey', linestyle='--')

ax1.set_ylabel("Success", fontsize=14, color='skyblue')
ax2.set_ylabel("Failure", fontsize=14, color='crimson')

ax1.set_xlim([clean_df.Year.min(), clean_df.Year.max()])

ax1.plot(overtime_success.Year, overtime_success["Mission_Status"], color='skyblue', linewidth=3)
ax2.plot(overtime_fail.Year, overtime_fail["Mission_Status"], color='crimson', linewidth=2, linestyle='--')
plt.show()
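
The pie charts above show how all failures (or successes) are distributed across years, which mostly tracks how many launches each year had. To answer whether risk fell over time, the failure rate within each year is more informative. The sketch below computes it on a hypothetical sample, under the assumption that only the exact status "Failure" counts as a failure.

```python
import pandas as pd

# Hypothetical sample with the two columns used from clean_df.
df = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019, 2020, 2020],
    "Mission_Status": ["Success", "Failure", "Success", "Partial Failure", "Success", "Failure"],
})

# Assumption: partial and prelaunch failures are not counted as failures here.
is_failure = df["Mission_Status"].eq("Failure")

# The mean of a boolean column per year is the fraction of failed launches in that year.
failure_pct = (
    is_failure.groupby(df["Year"])
    .mean()
    .mul(100)
    .rename("Failure_Pct")
    .reset_index()
)
print(failure_pct)
```

Years with very few launches will produce jumpy percentages, so a rolling mean over several years may give a clearer trend.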

For Every Year Show which Country was in the Lead in terms of Total Number of Launches (up to and including 2020)¶

Do the results change if we only look at the number of successful launches?

In [72]:
lead = clean_df.groupby(by=["Country", "Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
lead = lead.sort_values(by="Year", ascending=False)
lead.head()
Out[72]:
Country Year Mission_Status
177 USA 2020 20
108 New Zealand 2020 3
104 Kazakhstan 2020 6
86 Japan 2020 3
128 Russian Federation 2020 1
In [73]:
lead.tail()
Out[73]:
Country Year Mission_Status
133 USA 1968 10
132 USA 1967 8
131 USA 1966 3
130 USA 1965 2
129 USA 1964 2
In [74]:
s_lead = success.groupby(by=["Country", "Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
s_lead = s_lead.sort_values(by="Year", ascending=False)
s_lead.head()
Out[74]:
Country Year Mission_Status
170 USA 2020 19
121 Russian Federation 2020 1
28 China 2020 15
45 France 2020 2
98 Kazakhstan 2020 6
In [75]:
s_lead.tail()
Out[75]:
Country Year Mission_Status
126 USA 1968 9
125 USA 1967 6
124 USA 1966 3
123 USA 1965 2
122 USA 1964 1
In [76]:
lead_c = px.line(lead, x="Year", y="Mission_Status", color="Country", hover_name="Country")

lead_c.update_layout(xaxis_title="Year", yaxis_title="Number of Launches")
lead_c.show()
In [77]:
s_lead_c = px.line(s_lead, x="Year", y="Mission_Status", color="Country", hover_name="Country")

s_lead_c.update_layout(xaxis_title="Year", yaxis_title="Number of Successful Launches")
s_lead_c.show()
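
The line charts show every country's yearly count, but they do not name the leader for each year directly. One way to extract it is sketched below with a hypothetical helper, `leader_per_year`, run on a made-up sample. The same helper could be called with `"Organisation"` instead of `"Country"`, or on the `success` frame to restrict the count to successful launches.

```python
import pandas as pd

def leader_per_year(df, group_col):
    """Return, for each year, the group_col value with the most launches."""
    counts = (
        df.groupby(["Year", group_col], as_index=False)
        .size()
        .rename(columns={"size": "Launches"})
    )
    # idxmax gives the row label of the highest count per year (ties keep the first row).
    top_rows = counts.groupby("Year")["Launches"].idxmax()
    return counts.loc[top_rows].reset_index(drop=True)

# Hypothetical sample standing in for clean_df.
df = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2020, 2020, 2020],
    "Country": ["China", "China", "USA", "USA", "USA", "China"],
})

leaders = leader_per_year(df, "Country")
print(leaders)
```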

Create a Year-on-Year Chart Showing the Organisation with the Most Launches¶

Which organisation was dominant in the 1970s and 1980s? Which organisation was dominant in 2018, 2019 and 2020?

In [78]:
org_lead = clean_df.groupby(by=["Organisation", "Year"], as_index=False).agg({"Mission_Status": pd.Series.count})
org_lead = org_lead.sort_values(by="Year", ascending=False)
org_lead.head()
Out[78]:
Organisation Year Mission_Status
134 MHI 2020 2
17 Arianespace 2020 4
266 VKS RF 2020 1
245 ULA 2020 4
231 SpaceX 2020 13
In [79]:
org_lead.tail()
Out[79]:
Organisation Year Mission_Status
249 US Air Force 1967 7
141 NASA 1967 1
248 US Air Force 1966 3
247 US Air Force 1965 2
246 US Air Force 1964 2
In [80]:
org_lead_c = px.line(org_lead, x="Year", y="Mission_Status", color="Organisation", hover_name="Organisation")

org_lead_c.update_layout(xaxis_title="Year", yaxis_title="Number of Launches by Organisation")
org_lead_c.show()